Search for: All records

Creators/Authors contains: "Keung, Albert"



Yeast Surface Display of Protein Addresses Confers Robust Storage and Access of DNA-Based Data

https://doi.org/10.3390/dna5030034

Lee, Magdelene N; Brihadiswaran, Gunavaran; Rao, Balaji M; Tuck, James M; Keung, Albert J (September 2025, DNA)

Background/Objectives: The potential of DNA as an information-dense storage medium has inspired a broad spectrum of creative systems. In particular, hybrid biomolecular systems that integrate new materials and chemistries with DNA could drive novel functions. In this work, we explore the potential for proteins to serve as molecular file addresses. We stored DNA-encoded data in yeast and leveraged yeast surface display to readily produce the protein addresses and make them easy to access on the cell surface. Methods: We generated yeast populations that each displayed a distinct protein on their cell surfaces. These proteins included binding partners for cognate antibodies as well as chromatin-associated proteins that bind post-translationally modified histone peptides. For each specific yeast population, we transformed a library of hundreds of DNA sequences collectively encoding a specific image file. Results: We first demonstrated that the yeast retained file-encoded DNA through multiple cell divisions without a noticeable skew in their distribution or a loss in file integrity. Second, we showed that the physical act of sorting yeast displaying a specific file address was able to recover the desired data without a loss in file fidelity. Finally, we showed that analog addresses can be achieved by using addresses that have overlapping binding specificities for target peptides. Conclusions: These results motivate further exploration into the advantages proteins may confer in molecular information storage.
Full Text Available
Nanopore decoding with speed and versatility for data storage

https://doi.org/10.1093/bioinformatics/btaf006

Volkel, Kevin D; Hook, Paul W; Keung, Albert; Timp, Winston; Tuck, James M (December 2024, Bioinformatics)
Mathelier, Anthony (Ed.)
Motivation: As nanopore technology reaches ever higher throughput and accuracy, it becomes an increasingly viable candidate for reading out DNA data storage. Nanopore sequencing offers considerable flexibility by allowing long reads, real-time signal analysis, and the ability to read both DNA and RNA. We need flexible and efficient designs that match nanopore’s capabilities, but relatively few designs have been explored and many have significant inefficiency in read density, error rate, or compute time. To address these problems, we designed a new single-read per-strand decoder that achieves low byte error rates, offers high throughput, scales to long reads, and works well for both DNA and RNA molecules. We achieve these results through a novel soft decoding algorithm that can be effectively parallelized on a GPU. Our faster decoder allows us to study a wider range of system designs. Results: We demonstrate our approach on HEDGES, a state-of-the-art DNA-constrained convolutional code. We implement one hard decoder that runs serially and two soft decoders that run on GPUs. Our evaluation for each decoder is applied to the same population of nanopore reads collected from a synthesized library of strands. These same strands are synthesized with a T7 promoter to enable RNA transcription and decoding. Our results show that the hard decoder has a byte error rate over 25%, while the prior state of the art soft decoder can achieve error rates of 2.25%. However, that design also suffers a low throughput of 183 s/read. Our new Alignment Matrix Trellis soft decoder improves throughput by 257× with the trade-off of a higher byte error rate of 3.52% compared to the state of the art. Furthermore, we use the faster speed of our algorithm to explore more design options. We show that read densities of 0.33 bits/base can be achieved, which is 4× larger than prior MSA-based decoders. 
We also compare RNA to DNA, and find that RNA has 85% as many error-free reads when compared to DNA. Availability and implementation: Source code for our soft decoder and data used to generate figures is available publicly in the GitHub repository https://github.com/dna-storage/hedges-soft-decoder (10.5281/zenodo.11454877). All raw FAST5/FASTQ data are available at 10.5281/zenodo.11985454 and 10.5281/zenodo.12014515.
Full Text Available
A primordial DNA store and compute engine

https://doi.org/10.1038/s41565-024-01771-6

Lin, Kevin N; Volkel, Kevin; Cao, Cyrus; Hook, Paul W; Polak, Rachel E; Clark, Andrew S; San_Miguel, Adriana; Timp, Winston; Tuck, James M; Velev, Orlin D; et al (August 2024, Nature Nanotechnology)

There is a set of primordial features and functions expected of any modern information system: a substrate stably carrying data; the ability to repeatedly write, read, erase, reload, and compute on specific data from that substrate; and the overall ability to execute such functions in a seamless and programmable manner. For nascent molecular information technologies, proof of principle realization of this set of primordial capabilities would advance the vision for their continued development. Here, we present a DNA-based store and compute engine that captures these primordial capabilities. This system comprises multiple image files encoded into DNA and adsorbed onto ~50 µm diameter, highly porous, hierarchically branched, colloidal substrate particles comprised of naturally abundant cellulose acetate. Their surface areas are over 200 cm²/mg with binding capacities of over 10¹² DNA oligos/mg, 10 terabytes/mg, or 10⁴ terabytes/cm³. This “dendricolloid” stably holds DNA files better than bare DNA with an extrapolated ability to be repeatedly lyophilized and rehydrated over 170 times compared to 60 times, respectively. Accelerated aging studies project half-lives of ~6000 and 2 million years at 4 °C and -18 °C, respectively. The data can also be erased and replaced, and non-destructive file access is achieved through transcribing from distinct synthetic promoters. The resultant RNA molecules can be directly read via nanopore sequencing and can also be enzymatically computed to solve simplified 3x3 chess and sudoku problems. Our study establishes a feasible route for utilizing the high information density and parallel computational advantages of nucleic acids.
Full Text Available
A molecular assessment of the practical potential of DNA-based computation

https://doi.org/10.1016/j.copbio.2023.102940

Polak, Rachel E; Keung, Albert J (June 2023, Current Opinion in Biotechnology)

Full Text Available
FrameD: Framework for DNA-based Data Storage Design, Verification, and Validation

https://doi.org/10.1093/bioinformatics/btad572

Volkel, Kevin D; Lin, Kevin N; Hook, Paul W; Timp, Winston; Keung, Albert J; Tuck, James M (October 2023, Bioinformatics)
Kelso, Janet (Ed.)
Motivation: DNA-based data storage is a quickly growing field that hopes to harness the massive theoretical information density of DNA molecules to produce a competitive next-generation storage medium suitable for archival data. In recent years, many DNA-based storage system designs have been proposed. Given that no common infrastructure exists for simulating these storage systems, comparing many different designs along with many different error models is increasingly difficult. To address this challenge we introduce FrameD, a simulation infrastructure for DNA storage systems that leverages the underlying modularity of DNA storage system designs to provide a framework to express different designs while being able to reuse common components. Results: We demonstrate the utility of FrameD and the need for a common simulation platform using a case study. Our case study compares designs that utilize strand copies differently, some that align strand copies using Multiple Sequence Alignment (MSA) algorithms and others that do not. We found that the choice to include MSA in the pipeline is dependent on the error rate and the type of errors being injected and is not always beneficial. In addition to supporting a wide range of designs, FrameD provides the user with transparent parallelism to deal with a large number of reads from sequencing and the need for many fault injection iterations. We believe that FrameD fills a void in the tools publicly available to the DNA storage community by providing a modular and extensible framework with support for massive parallelism. As a result, it will help accelerate the design process of future DNA-based storage systems. Availability and implementation: The source code for FrameD along with the data generated during the demonstration of FrameD is available in a public GitHub repository at https://github.com/dna-storage/framed (10.5281/zenodo.7757762).
Full Text Available
Chaetocin disrupts the SUV39H1–HP1 interaction independent of SUV39H1 methyltransferase activity

https://doi.org/10.1042/BCJ20220528

Han, Linna; Lee, Jessica B.; Indermaur, Elaine W.; Keung, Albert J. (March 2023, Biochemical Journal)

Chemical tools to control the activities and interactions of chromatin components have broad impact on our understanding of cellular and disease processes. It is important to accurately identify their molecular effects to inform clinical efforts and interpretations of scientific studies. Chaetocin is a widely used chemical that decreases H3K9 methylation in cells. It is frequently attributed as a specific inhibitor of the histone methyltransferase activities of SUV39H1/SU(VAR)3–9, although prior observations showed chaetocin likely inhibits methyltransferase activity through covalent mechanisms involving its epipolythiodioxopiperazine disulfide ‘warhead’ functionality. The continued use of chaetocin in scientific studies may derive from the net effect of reduced H3K9 methylation, irrespective of a direct or indirect mechanism. However, there may be other molecular impacts of chaetocin on SUV39H1 besides inhibition of H3K9 methylation levels that could confound the interpretation of past and future experimental studies. Here, we test a new hypothesis that chaetocin may have an additional downstream impact aside from inhibition of methyltransferase activity. Using a combination of truncation mutants, a yeast two-hybrid system, and direct in vitro binding assays, we show that the human SUV39H1 chromodomain (CD) and HP1 chromoshadow domain (CSD) directly interact. Chaetocin inhibits this binding interaction through its disulfide functionality with some specificity by covalently binding with the CD of SUV39H1, whereas the histone H3–HP1 interaction is not inhibited. Given the key role of HP1 dimers in driving a feedback cascade to recruit SUV39H1 and to establish and stabilize constitutive heterochromatin, this additional molecular consequence of chaetocin should be broadly considered.
Full Text Available
DINOS: Data INspired Oligo Synthesis for DNA Data Storage

https://doi.org/10.1145/3510853

Volkel, Kevin; Tomek, Kyle J.; Keung, Albert J.; Tuck, James M. (July 2022, ACM Journal on Emerging Technologies in Computing Systems)

As interest in DNA-based information storage grows, the costs of synthesis have been identified as a key bottleneck. A potential direction is to tune synthesis for data. Data strands tend to be composed of a small set of recurring code word sequences, and they contain longer sequences of repeated data. To exploit these properties, we propose a new framework called DINOS. DINOS consists of three key parts: (i) The first is a hierarchical strand assembly algorithm, inspired by gene assembly techniques that can assemble arbitrary data strands from a small set of primitive blocks. (ii) The assembly algorithm relies on our novel formulation for how to construct primitive blocks, spanning a variety of useful configurations from a set of code words and overhangs. Each primitive block is a code word flanked by a pair of overhangs that are created by a cyclic pairing process that keeps the number of primitive blocks small. Using these primitive blocks, any data strand of arbitrary length can be assembled, theoretically. We show a minimal system for a binary code with as few as six primitive blocks, and we generalize our processes to support an arbitrary set of overhangs and code words. (iii) We exploit our hierarchical assembly approach to identify redundant sequences and coalesce the reactions that create them to make assembly more efficient. We evaluate DINOS and describe its key characteristics. For example, the number of reactions needed to make a strand can be reduced by increasing the number of overhangs or the number of code words, but increasing the number of overhangs offers a small advantage over increasing code words while requiring substantially fewer primitive blocks. However, density is improved more by increasing the number of code words. 
We also find that a simple redundancy coalescing technique is able to reduce reactions by 90.6% and 41.2% on average for decompressed and compressed data, respectively, even when the smallest data fragments being assembled are 16 bits. With a simple padding heuristic that finds even more redundancy, we can further decrease reactions for the same operating point up to 91.1% and 59% for decompressed and compressed data, respectively, on average. Our approach offers greater density by up to 80% over a prior general purpose gene assembly technique. Finally, in an analysis of synthesis costs in which we make 1 GB volume using de novo synthesis versus making only the primitive blocks with de novo synthesis and otherwise assembling using DINOS, we estimate DINOS as 10⁵× cheaper than de novo synthesis.
Full Text Available
Evaluation of UBE3A antibodies in mice and human cerebral organoids

https://doi.org/10.1038/s41598-021-85923-x

Sen, Dilara; Drobna, Zuzana; Keung, Albert J. (December 2021, Scientific Reports)
UBE3A is an E3 ubiquitin ligase encoded by the neurally imprinted UBE3A gene. The abundance and subcellular distribution of UBE3A have been the topic of many previous studies, as its dosage and localization have been linked to neurodevelopmental disorders including autism, Dup15q syndrome, and Angelman syndrome. While commercially available antibodies have been widely employed to determine UBE3A localization, an extensive analysis and comparison of the performance of different UBE3A antibodies has not been conducted. Here we evaluated the specificities of seven commercial UBE3A antibodies in two of the major experimental models used in UBE3A research, mouse and human pluripotent stem cell-derived neural cells and tissues. We tested these antibodies in their two most common assays, immunofluorescence and western blot. In addition, we assessed the ability of these antibodies to capture dynamic spatiotemporal changes of UBE3A by utilizing human cerebral organoid models. Our results reveal that among the seven antibodies tested, three antibodies demonstrated substantial nonspecific immunoreactivity. While four antibodies show specific localization patterns in both mouse brain sections and human cerebral organoids, these antibodies varied significantly in background signals and staining patterns in undifferentiated human pluripotent stem cells.
Full Text Available
Modified Histone Peptides Linked to Magnetic Beads Reduce Binding Specificity

https://doi.org/10.3390/ijms23031691

Meanor, Jenna N.; Keung, Albert J.; Rao, Balaji M. (February 2022, International Journal of Molecular Sciences)

Histone post-translational modifications are small chemical changes to the histone protein structure that have cascading effects on diverse cellular functions. Detecting histone modifications and characterizing their binding partners are critical steps in understanding chromatin biochemistry and have been accessed using common reagents such as antibodies, recombinant assays, and FRET-based systems. High-throughput platforms could accelerate work in this field, and also could be used to engineer de novo histone affinity reagents; yet, published studies on their use with histones have been noticeably sparse. Here, we describe specific experimental conditions that affect binding specificities of post-translationally modified histones in classic protein engineering platforms and likely explain the relative difficulty with histone targets in these platforms. We also show that manipulating avidity of binding interactions may improve specificity of binding.
Full Text Available
DNA stability: a central design consideration for DNA data storage systems

https://doi.org/10.1038/s41467-021-21587-5

Matange, Karishma; Tuck, James M.; Keung, Albert J. (December 2021, Nature Communications)
Data storage in DNA is a rapidly evolving technology that could be a transformative solution for the rising energy, materials, and space needs of modern information storage. Given that the information medium is DNA itself, its stability under different storage and processing conditions will fundamentally impact and constrain design considerations and data system capabilities. Here we analyze the storage conditions, molecular mechanisms, and stabilization strategies influencing DNA stability and pose specific design configurations and scenarios for future systems that best leverage the considerable advantages of DNA storage.
Full Text Available
